0. The Convolutional Neural Network (CNN) architecture

Motivation

Our Multi-Layer Perceptron (MLP) works quite well on the MNIST dataset. So what need is there for a new architecture?

The problem with the MLP is that it does not take the structure of the image into account. Indeed, the input data, an array with 784 entries, could just as well be shuffled (entry 0 becomes entry 26, entry 26 becomes entry 54, and so on): as long as the same permutation is applied to every image, it would not matter to the MLP (try it!).

This means that the two images below are essentially the same for our neural network!
(Going from left to right, we applied a random permutation of the grayscale values of this MNIST image).

Clearly, our architecture wastes information. In addition, it is computationally expensive because our network is fully connected: for the MNIST dataset this is still feasible, but for higher-resolution images, learning all the weights and biases quickly becomes prohibitive.

We will now see how a cleverer architecture can be used to efficiently tackle image classification.

Architecture

Restrict connections

A simple solution to the computational cost is to restrict connections between units. We will now connect each unit in the hidden layer to a small patch of contiguous pixels from the input image.

This is in close analogy to the way visual systems work: neurons in the visual cortex have localized receptive fields and will only respond to stimuli from a given location.

Use shared weights

We will also make use of shared weights. By shared weights, we mean that we apply the same transformation to every patch of the input image. This ensures that every unit in the hidden layer detects exactly the same pattern, albeit at different locations in the input image.

This design choice is desirable because natural images have the property of being "stationary". For instance, if we find an operation that allows us to detect a stick in one part of the image, we should be able to detect a stick elsewhere by applying the same operation to other patches of the image.

Shared weights also immediately decrease the number of parameters to be learnt and thus the computational cost.

The way convolutional networks operate is best explained visually:

In the GIF above, the green array is a 5x5 schematic input image, with cells indicating pixels and cell values indicating color intensity. The pink array is the 3x3 hidden layer.

Each unit in the hidden layer is connected to a 3x3 patch of the input image (the sliding orange window). The value of each cell in the hidden layer is computed with the same operation (i.e. the weights of the sliding orange window do not change).

For instance, the first unit of the hidden layer is connected to the following patch of the input image:

$$P= \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

To compute the value of the first unit of the hidden layer, we do:

$$\begin{aligned} \text{value} = \text{Sum}(P\odot W) &= 1\times 1 + 1\times 0 + 1\times 1 \\ &\quad + 0\times 0 + 1\times 1 + 1\times 0 \\ &\quad + 0\times 1 + 0\times 0 + 1\times 1 \\ &= 4 \end{aligned} \tag{1}$$

where W, the weight matrix, is defined as:

$$W= \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
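
As a quick sanity check, here is a minimal NumPy sketch (independent of the Theano code later in this notebook) that reproduces Eq. (1) for the first patch:

import numpy as np

# First 3x3 patch of the input image (P above) and the shared weights (W above)
P = np.array([[1, 1, 1],
              [0, 1, 1],
              [0, 0, 1]])
W = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

# Element-wise product followed by a sum, exactly as in Eq. (1)
print(np.sum(P * W))  # prints 4, the value of the first hidden unit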

Of course, we won't restrict ourselves to a single weight matrix (also called a kernel or filter). Indeed, we don't want to detect sticks only; we may also be interested in detecting stars or ellipses. This is why we will use several weight matrices at this stage, as shown in the image below:

In this example, we apply 3 different 5 x 5 weight matrices to the 28 x 28 MNIST input, obtaining convolved features of size 3 x (28 - 5 + 1) x (28 - 5 + 1) = 3 x 24 x 24.

As an aside, note that the operation in Eq. (1) is a form of convolution, hence the name of the architecture.
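
To make the shapes concrete, here is a rough plain-NumPy sketch (a naive triple loop, not the Theano implementation used later) that applies several filters to a 28 x 28 image; the filter values are random placeholders, only the output shape matters:

import numpy as np

def convolve_all(image, filters):
    """Slide each filter over the image with stride 1 and no padding."""
    k, fh, fw = filters.shape
    h, w = image.shape
    out = np.zeros((k, h - fh + 1, w - fw + 1))
    for f in range(k):
        for i in range(h - fh + 1):
            for j in range(w - fw + 1):
                out[f, i, j] = np.sum(image[i:i + fh, j:j + fw] * filters[f])
    return out

image = np.random.rand(28, 28)      # stand-in for an MNIST image
filters = np.random.rand(3, 5, 5)   # 3 weight matrices of size 5 x 5
print(convolve_all(image, filters).shape)  # (3, 24, 24)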

Pooling

It is possible to use all the features in the hidden layer directly. However, this can lead to computational issues. Let us give an illustrative example:

  • Assume images of 96 x 96 pixels
  • We use an 8 x 8 sliding window
  • This gives (96 - 8 + 1) x (96 - 8 + 1) = 89 x 89 = 7921 units in the hidden layer for each weight matrix
  • If we assume we have 400 weight matrices, this yields a feature vector with 7921 x 400 = 3,168,400 entries (see the quick size check below)
  • Such a large feature vector is clearly unwieldy!
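
The quick size check referred to above, as plain arithmetic:

# Size check for the 96x96 example above
hidden_units_per_filter = (96 - 8 + 1) ** 2      # 7921 units per weight matrix
total_features = hidden_units_per_filter * 400   # 3,168,400 features in total
print(hidden_units_per_filter, total_features)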

To address this, we condense the information with a so-called pooling layer. For instance, we could take the mean of neighboring cells in the hidden layer and obtain a feature vector of much lower dimension.

This operation can greatly reduce the computational burden and also address overfitting issues. The figure below shows how this works in practice:

The hidden-layer units are divided into 4 groups (highlighted by the sliding red window). We then take the mean (or the max, or whatever summary statistic we choose) of each group. This yields a vector of much lower dimension.
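
As an illustration, here is a minimal NumPy sketch of 2 x 2 pooling on a hypothetical 4 x 4 map of hidden-layer values (the reshape trick assumes the map size is divisible by the pooling size):

import numpy as np

features = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical hidden-layer values

# Group the units into non-overlapping 2x2 blocks:
# axes are (row block, row within block, column block, column within block)
blocks = features.reshape(2, 2, 2, 2)
mean_pooled = blocks.mean(axis=(1, 3))  # mean pooling
max_pooled = blocks.max(axis=(1, 3))    # max pooling
print(mean_pooled.shape)  # (2, 2): four times fewer features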

Full architecture

Combining all that we've seen above, we get the following architecture:

  • Input layer (the 28 x 28 MNIST image)
  • Hidden layer of dimension 3 x 24 x 24 (obtained with 3 filter matrices and a 5 x 5 sliding window)
  • A pooling layer of dimension 3 x 12 x 12 (obtained by taking the max over 2 x 2 windows of the hidden layer)
  • The final layer is a fully connected layer which connects all the pooled features to the 10 output units

N.B. The size of the convolution filter (5 x 5 in this example) is a hyperparameter, as is the stride length (which states by how many pixels we slide the window). They can and should be tuned. The small sketch below shows how these choices determine the layer sizes quoted above.
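
For reference, a tiny helper (illustrative only, the name is our own) that gives the output size of a convolution or pooling window with no padding, and reproduces the dimensions listed above:

def output_size(input_size, window, stride=1):
    """Number of window positions along one dimension (no padding)."""
    return (input_size - window) // stride + 1

print(output_size(28, 5))            # 24: a 5x5 filter over a 28x28 image
print(output_size(24, 2, stride=2))  # 12: 2x2 max pooling over the 24x24 map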

1. Theano

We will now use Theano to build our neural network.

Description

From Wikipedia:

Theano is a numerical computation library for Python. In Theano, computations are expressed using a NumPy-like syntax and compiled to run efficiently on either CPU or GPU architectures.

If you have a GPU on your machine, Theano can greatly speed up computations for deep neural networks. If you don't, you can run a GPU instance on Amazon Web Services.

Installation

An easy way to install Theano and its dependencies is to use the Anaconda distribution by Continuum Analytics.

To do so, we download the latest package from the Anaconda website. On Linux, as of this writing (January 2016), the command is:

wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.1-Linux-x86_64.sh

Then, in the directory where it was downloaded:

bash Anaconda2-2.4.1-Linux-x86_64.sh

Alternatively, follow the installation instructions at http://docs.continuum.io/anaconda/install

Then update to the latest version:

conda update conda

conda update ipython ipython-notebook ipython-qtconsole

Finally, install Theano:

pip install Theano

You should now be able to use Theano with Python. All the code below can be run without a GPU; however, activating the GPU may greatly speed up the examples.
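
To check that the installation works, a minimal smoke test along the lines of the official Theano tutorial should print 9.0:

import theano
import theano.tensor as T

x = T.dscalar('x')                      # a symbolic scalar
square = theano.function([x], x ** 2)   # compile a graph that squares it
print(square(3.0))                      # 9.0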


In [1]:
"""
This program makes use of the Theano library to speed up computations.

Besides speeding up computations through the use of a GPU,
Theano also automatically computes the gradients required by the
backpropagation algorithm.

This program also implements dropout
(a method in which you randomly deactivate some units inside the neural network);
this helps regularisation and performance,
but you should pay it no heed in this tutorial.

"""

#### Libraries
# Standard library
import cPickle
import gzip

# Third-party libraries
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv
from theano.tensor.nnet import softmax
from theano.tensor import shared_randomstreams
from theano.tensor.signal import downsample

# Activation functions for neurons
from theano.tensor.nnet import sigmoid
from theano.tensor import tanh


#### Constants
GPU = False
if GPU:
    print "Running under a GPU"
    try: theano.config.device = 'gpu'
    except: pass # it's already set
    theano.config.floatX = 'float32'
else:
    print "Running with a CPU"

#### Load the MNIST data
def load_data_shared(filename="../data/mnist.pkl.gz"):
    f = gzip.open(filename, 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()

    
    # We store the data in so-called shared-variables
    #The reason we store our dataset in shared variables is to allow
    #Theano to copy it into the GPU memory (when code is run on GPU).
    #Since copying data into the GPU is slow, copying a minibatch everytime
    #is needed (the default behaviour if the data is not in a shared
    #variable) would lead to a large decrease in performance.
    def shared(data):
        """Place the data into shared variables.  This allows Theano to copy
        the data to the GPU, if one is available.

        """
        shared_x = theano.shared(
            np.asarray(data[0], dtype=theano.config.floatX), borrow=True)
        shared_y = theano.shared(
            np.asarray(data[1], dtype=theano.config.floatX), borrow=True)
        return shared_x, T.cast(shared_y, "int32")
    return [shared(training_data), shared(validation_data), shared(test_data)]

#### Main class used to construct and train networks
class Network(object):

    def __init__(self, layers, mini_batch_size):
        """Takes a list of `layers`, describing the network architecture, and
        a value for the `mini_batch_size` to be used during training
        by stochastic gradient descent.

        """
        self.layers = layers
        self.mini_batch_size = mini_batch_size
        self.params = [param for layer in self.layers for param in layer.params]
        # Theano symbolic variables
        self.x = T.matrix("x")
        self.y = T.ivector("y")
        init_layer = self.layers[0]
        init_layer.set_inpt(self.x, self.x, self.mini_batch_size)
        for j in xrange(1, len(self.layers)):
            prev_layer, layer  = self.layers[j-1], self.layers[j]
            layer.set_inpt(
                prev_layer.output, prev_layer.output_dropout, self.mini_batch_size)
        self.output = self.layers[-1].output
        self.output_dropout = self.layers[-1].output_dropout

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            validation_data, test_data, lmbda=0.0):
        """Train the network using mini-batch stochastic gradient descent."""
        training_x, training_y = training_data
        validation_x, validation_y = validation_data
        test_x, test_y = test_data

        # compute number of minibatches for training, validation and testing
        num_training_batches = size(training_data)/mini_batch_size
        num_validation_batches = size(validation_data)/mini_batch_size
        num_test_batches = size(test_data)/mini_batch_size

        # define the (regularized) cost function, symbolic gradients, and updates
        l2_norm_squared = sum([(layer.w**2).sum() for layer in self.layers])
        # set up the (regularised) log likelihood function
        cost = self.layers[-1].cost(self)+\
               0.5*lmbda*l2_norm_squared/num_training_batches
        # compute the gradient of our cost function
        grads = T.grad(cost, self.params)
        updates = [(param, param-eta*grad)
                   for param, grad in zip(self.params, grads)]

        # define functions to train a mini-batch, and to compute the
        # accuracy in validation and test mini-batches.
        i = T.lscalar() # mini-batch index
        # train_mb is a Theano function, compiled below, that trains on one mini-batch
        train_mb = theano.function(
            [i], cost, updates=updates,
            givens={
                self.x:
                training_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
                self.y:
                training_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
            })
        validate_mb_accuracy = theano.function(
            [i], self.layers[-1].accuracy(self.y),
            givens={
                self.x:
                validation_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
                self.y:
                validation_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
            })
        test_mb_accuracy = theano.function(
            [i], self.layers[-1].accuracy(self.y),
            givens={
                self.x:
                test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
                self.y:
                test_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
            })
        self.test_mb_predictions = theano.function(
            [i], self.layers[-1].y_out,
            givens={
                self.x:
                test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
            })
        # Do the actual training
        best_validation_accuracy = 0.0
        for epoch in xrange(epochs):
            for minibatch_index in xrange(num_training_batches):
                iteration = num_training_batches*epoch+minibatch_index
                if iteration % 1000 == 0:
                    print("Training mini-batch number {0}".format(iteration))
                cost_ij = train_mb(minibatch_index)
                if (iteration+1) % num_training_batches == 0:
                    validation_accuracy = np.mean(
                        [validate_mb_accuracy(j) for j in xrange(num_validation_batches)])
                    print("Epoch {0}: validation accuracy {1:.2%}".format(
                        epoch, validation_accuracy))
                    if validation_accuracy >= best_validation_accuracy:
                        print("This is the best validation accuracy to date.")
                        best_validation_accuracy = validation_accuracy
                        best_iteration = iteration
                        if test_data:
                            test_accuracy = np.mean(
                                [test_mb_accuracy(j) for j in xrange(num_test_batches)])
                            print('The corresponding test accuracy is {0:.2%}'.format(
                                test_accuracy))
        print("Finished training network.")
        print("Best validation accuracy of {0:.2%} obtained at iteration {1}".format(
            best_validation_accuracy, best_iteration))
        print("Corresponding test accuracy of {0:.2%}".format(test_accuracy))

#### Define layer types

class ConvPoolLayer(object):
    """Used to create a combination of a convolutional and a max-pooling
    layer.  A more sophisticated implementation would separate the
    two, but for our purposes we'll always use them together, and it
    simplifies the code, so it makes sense to combine them.

    """

    def __init__(self, filter_shape, image_shape, poolsize=(2, 2),
                 activation_fn=sigmoid):
        """`filter_shape` is a tuple of length 4, whose entries are the number
        of filters, the number of input feature maps, the filter height, and the
        filter width.

        `image_shape` is a tuple of length 4, whose entries are the
        mini-batch size, the number of input feature maps, the image
        height, and the image width.

        `poolsize` is a tuple of length 2, whose entries are the y and
        x pooling sizes.

        """
        self.filter_shape = filter_shape
        self.image_shape = image_shape
        self.poolsize = poolsize
        self.activation_fn=activation_fn
        # initialize weights and biases
        n_out = (filter_shape[0]*np.prod(filter_shape[2:])/np.prod(poolsize))
        self.w = theano.shared(
            np.asarray(
                np.random.normal(loc=0, scale=np.sqrt(1.0/n_out), size=filter_shape),
                dtype=theano.config.floatX),
            borrow=True)
        self.b = theano.shared(
            np.asarray(
                np.random.normal(loc=0, scale=1.0, size=(filter_shape[0],)),
                dtype=theano.config.floatX),
            borrow=True)
        self.params = [self.w, self.b]

    def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
        self.inpt = inpt.reshape(self.image_shape)
        conv_out = conv.conv2d(
            input=self.inpt, filters=self.w, filter_shape=self.filter_shape,
            image_shape=self.image_shape)
        pooled_out = downsample.max_pool_2d(
            input=conv_out, ds=self.poolsize, ignore_border=True)
        self.output = self.activation_fn(
            pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        self.output_dropout = self.output # no dropout in the convolutional layers

        
        

class FullyConnectedLayer(object):

    def __init__(self, n_in, n_out, activation_fn=sigmoid, p_dropout=0.0):
        self.n_in = n_in
        self.n_out = n_out
        self.activation_fn = activation_fn
        self.p_dropout = p_dropout
        # Initialize weights and biases
        # use gaussian with proper width to initialise
        # we use shared variables to limit computation overhead
        # between CPU and GPU
        self.w = theano.shared(
            np.asarray(
                np.random.normal(
                    loc=0.0, scale=np.sqrt(1.0/n_out), size=(n_in, n_out)),
                dtype=theano.config.floatX),
            name='w', borrow=True)
        self.b = theano.shared(
            np.asarray(np.random.normal(loc=0.0, scale=1.0, size=(n_out,)),
                       dtype=theano.config.floatX),
            name='b', borrow=True)
        self.params = [self.w, self.b]

    def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
        self.inpt = inpt.reshape((mini_batch_size, self.n_in))
        self.output = self.activation_fn(
            (1-self.p_dropout)*T.dot(self.inpt, self.w) + self.b)
        self.y_out = T.argmax(self.output, axis=1)
        self.inpt_dropout = dropout_layer(
            inpt_dropout.reshape((mini_batch_size, self.n_in)), self.p_dropout)
        self.output_dropout = self.activation_fn(
            T.dot(self.inpt_dropout, self.w) + self.b)

    def accuracy(self, y):
        "Return the accuracy for the mini-batch."
        return T.mean(T.eq(y, self.y_out))

class SoftmaxLayer(object):

    def __init__(self, n_in, n_out, p_dropout=0.0):
        self.n_in = n_in
        self.n_out = n_out
        self.p_dropout = p_dropout
        # Initialize weights and biases
        self.w = theano.shared(
            np.zeros((n_in, n_out), dtype=theano.config.floatX),
            name='w', borrow=True)
        self.b = theano.shared(
            np.zeros((n_out,), dtype=theano.config.floatX),
            name='b', borrow=True)
        self.params = [self.w, self.b]

    def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
        self.inpt = inpt.reshape((mini_batch_size, self.n_in))
        self.output = softmax((1-self.p_dropout)*T.dot(self.inpt, self.w) + self.b)
        self.y_out = T.argmax(self.output, axis=1)
        self.inpt_dropout = dropout_layer(
            inpt_dropout.reshape((mini_batch_size, self.n_in)), self.p_dropout)
        self.output_dropout = softmax(T.dot(self.inpt_dropout, self.w) + self.b)

    def cost(self, net):
        "Return the log-likelihood cost."
        return -T.mean(T.log(self.output_dropout)[T.arange(net.y.shape[0]), net.y])

    def accuracy(self, y):
        "Return the accuracy for the mini-batch."
        return T.mean(T.eq(y, self.y_out))


#### Miscellanea
def size(data):
    "Return the size of the dataset `data`."
    return data[0].get_value(borrow=True).shape[0]

def dropout_layer(layer, p_dropout):
    srng = shared_randomstreams.RandomStreams(
        np.random.RandomState(0).randint(999999))
    mask = srng.binomial(n=1, p=1-p_dropout, size=layer.shape)
    return layer*T.cast(mask, theano.config.floatX)


WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING:theano.configdefaults:g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
Running with a CPU
C:\Anaconda2\lib\site-packages\theano\tensor\signal\downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")

2. Application example

Multi-layer perceptron benchmark

  • An input layer
  • One hidden layer
  • A softmax output layer
  • The learning rate $\eta$ is set to 0.1, the mini-batch size to 10, and we train for 60 epochs.

The accuracy should reach somewhere close to 97.8% at best.

N.B. It takes some time to complete the full training; feel free to stop at any time. N.B. If you have a GPU, feel free to activate it (set the GPU flag to True in the code); it should speed up computations!


In [ ]:
training_data, validation_data, test_data = load_data_shared()
mini_batch_size = 10
net = Network([
        FullyConnectedLayer(n_in=784, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, 
            validation_data, test_data)

Switching to a CNN

  • An input layer
  • One convolutional layer (20 weight matrices of size 5 x 5)
  • One pooling layer (takes the max over 2 x 2 windows of the convolutional layer)
  • One fully connected layer (100 sigmoid units)
  • One softmax layer
  • The learning rate $\eta$ is set to 0.1, the mini-batch size to 10, and we train for 60 epochs.

The accuracy should reach somewhere close to 98.8% at best. While this may not seem like much of an improvement, it actually means a 45% decrease in the error rate compared with the previous network!


In [47]:
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28), 
                      filter_shape=(20, 1, 5, 5), 
                      poolsize=(2, 2)),
        FullyConnectedLayer(n_in=20*12*12, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, 
            validation_data, test_data)


Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 94.33%
This is the best validation accuracy to date.
The corresponding test accuracy is 93.80%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 96.43%
This is the best validation accuracy to date.
The corresponding test accuracy is 96.16%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 97.31%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.09%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-batch number 19000
Epoch 3: validation accuracy 97.67%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.55%
Training mini-batch number 20000
Training mini-batch number 21000
Training mini-batch number 22000
Training mini-batch number 23000
Training mini-batch number 24000
Epoch 4: validation accuracy 97.96%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.87%
Training mini-batch number 25000
Training mini-batch number 26000
Training mini-batch number 27000
Training mini-batch number 28000
Training mini-batch number 29000
Epoch 5: validation accuracy 98.19%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.99%
Training mini-batch number 30000
Training mini-batch number 31000
Training mini-batch number 32000
Training mini-batch number 33000
Training mini-batch number 34000
Epoch 6: validation accuracy 98.30%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.15%
Training mini-batch number 35000
Training mini-batch number 36000
Training mini-batch number 37000
Training mini-batch number 38000
Training mini-batch number 39000
Epoch 7: validation accuracy 98.34%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.23%
Training mini-batch number 40000
Training mini-batch number 41000
Training mini-batch number 42000
Training mini-batch number 43000
Training mini-batch number 44000
Epoch 8: validation accuracy 98.43%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.28%
Training mini-batch number 45000
Training mini-batch number 46000
Training mini-batch number 47000
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-47-0720f25255d0> in <module>()
      6         SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
      7 net.SGD(training_data, 60, mini_batch_size, 0.1, 
----> 8             validation_data, test_data) 

<ipython-input-45-030f307c9bf9> in SGD(self, training_data, epochs, mini_batch_size, eta, validation_data, test_data, lmbda)
    134                 if iteration % 1000 == 0:
    135                     print("Training mini-batch number {0}".format(iteration))
--> 136                 cost_ij = train_mb(minibatch_index)
    137                 if (iteration+1) % num_training_batches == 0:
    138                     validation_accuracy = np.mean(

/home/tmain/anaconda/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    593         t0_fn = time.time()
    594         try:
--> 595             outputs = self.fn()
    596         except Exception:
    597             if hasattr(self.fn, 'position_of_error'):

/home/tmain/anaconda/lib/python2.7/site-packages/theano/gof/op.pyc in rval(p, i, o, n)
    766             # default arguments are stored in the closure of `rval`
    767             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 768                 r = p(n, [x[0] for x in i], o)
    769                 for o in node.outputs:
    770                     compute_map[o][0] = True

/home/tmain/anaconda/lib/python2.7/site-packages/theano/tensor/raw_random.pyc in perform(self, node, inputs, out_)
    250             r = copy(r)
    251         rout[0] = r
--> 252         rval = self.fn(r, *(args + [shape]))
    253         if not isinstance(rval, numpy.ndarray) \
    254                or str(rval.dtype) != node.outputs[1].type.dtype:

KeyboardInterrupt: 

Appendix: producing the MNIST image and the shuffled version


In [37]:
# Let's start by importing the relevant packages:
import cPickle as pickle
import subprocess
import os
import struct
import numpy as np
from matplotlib.gridspec import GridSpec
import matplotlib.pylab as plt
import matplotlib as mpl

# Define a function to unzip the mnist data and load it
def get_mnist_data():
    """ Unzip mnist data if needed and load it"""
    if not os.path.isfile("./Data/mnist.pkl"):
        subprocess.call("gunzip -k ./Data/mnist.pkl.gz".split(" "))
    with open("./Data/mnist.pkl", "r") as f:
        return pickle.load(f)

p = get_mnist_data()
print p[0][0][0].shape, p[0][1].shape
im, num =  p[0][0][0], p[0][1][0]
im = np.reshape(im, (28,28))
imr = im.copy()
# Shuffle a copy of the image
# (note: np.random.shuffle on a 2-D array permutes its rows)
np.random.shuffle(imr)

fig = plt.figure()
gs = GridSpec(1, 2, bottom=0.18, left=0.18, right=0.88)

ax = fig.add_subplot(gs[0])
axshuff = fig.add_subplot(gs[1])

#Plot the first MNIST image in the data and its 
# shuffled version side by side.
image = ax.imshow(im, cmap=mpl.cm.Greys)
imageshuff = axshuff.imshow(imr, cmap=mpl.cm.Greys)
imageshuff.set_interpolation('nearest')
image.set_interpolation('nearest')

plt.savefig("Figures/rand_shuffle.png", bbox_inches='tight')


(784,) (50000,)

Figures courtesy of Michael Nielsen and Stanford University

